Shift-curvature, SGD, and generalization

Abstract

A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that stochastic gradient descent (SGD) discourages curvature. We offer a more complete and nuanced view in support of both hypotheses. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and the bias-curvature, in addition to a known parameter-covariance mechanism. The shift refers to the difference between train and test local minima, and the bias and covariance are those of the parameter distribution. These three curvature-mediated contributions to test performance are reparametrization-invariant even though curvature itself is not. Although the shift is unknown at training time, the shift-curvature as well as the other mechanisms can still be mitigated by minimizing overall curvature. Second, we derive a new, explicit SGD steady-state distribution showing that SGD optimizes an effective potential related to but different from the train loss, and that SGD noise mediates a trade-off between low-loss versus low-curvature regions of this potential. Third, combining our test performance analysis with the SGD steady state shows that, for small SGD noise, the shift-curvature is the dominant of the three mechanisms. Our experiments demonstrate the significant impact of shift-curvature on test loss, and further explore the relationship between SGD noise and curvature.

Similar resources

Theory of Deep Learning III: Generalization Properties of SGD

In Theory III we characterize, with a mix of theory and experiments, the consistency and generalization properties of deep convolutional networks trained with stochastic gradient descent in classification tasks. A present perceived puzzle is that deep networks show good predictive performance when overparametrization relative to the number of training data suggests overfitting. We describe an exp...

Improving Generalization Performance by Switching from Adam to SGD

Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad, or RMSprop have been found to generalize poorly compared to stochastic gradient descent (SGD). These methods tend to perform well in the initial portion of training but are outperformed by SGD at later stages of training. We investigate a hybrid strategy that begins training with an adaptive method and switc...

Spatial Peak Shift and Generalization in Pigeons

How pigeons generalize across spatial locations was examined in the 4 experiments reported in this article. During training, a square was presented at a fixed height at 1 of 2 horizontal locations on a monitor screen. One location (S+) signaled reward, whereas the other one (S−) signaled no reward. The birds were then tested occasionally with a range of locations. After training with S+ only, ...

Temporal generalization and peak shift in humans.

Three experiments investigated temporal generalization in humans. In Experiment 1, a peak shift effect was produced when participants were given intradimensional discrimination training. In Experiment 2, after training with a standard S+ and generalization testing with an asymmetrical series of durations, generalization gradients moved toward the prevailing adaptation level. In Experiment 3, ge...

Spatial generalization and peak shift in humans

Using a computer betting game, five experiments tested university students on spatial generalization and peak shift. On each trial, one location was marked and the subject was invited to bet 0–4 points. At the winning location (S+), bets won four times the points betted. At nearby losing locations (S−s), points betted were lost. Generalization gradients were exponential in shape, supporting She...

Journal

Journal title: Machine Learning: Science and Technology

Year: 2022

ISSN: 2632-2153

DOI: https://doi.org/10.1088/2632-2153/ac92c4